Improved Finite Blocklength Converses for Slepian-Wolf Coding via Linear Programming
A new finite blocklength converse for the Slepian-Wolf coding problem is
presented which significantly improves on the best known converse for this
problem, due to Miyake and Kanaya [2]. To obtain this converse, an extension of
the linear programming (LP) based framework for finite blocklength point-
to-point coding problems from [3] is employed. However, a direct application of
this framework demands a complicated analysis for the Slepian-Wolf problem. An
analytically simpler approach is presented wherein LP-based finite blocklength
converses for this problem are synthesized from point-to-point lossless source
coding problems with perfect side-information at the decoder. New finite
blocklength metaconverses for these point-to-point problems are derived by
employing the LP-based framework, and the new converse for Slepian-Wolf coding
is obtained by an appropriate combination of these converses.

Comment: under review with the IEEE Transactions on Information Theory
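For orientation, a standard fact not taken from the paper itself: any finite
blocklength converse for Slepian-Wolf coding must, in the limit of large
blocklength, recover the classical asymptotic rate region.

```latex
% Classical asymptotic Slepian-Wolf rate region for jointly distributed
% sources (X, Y) encoded separately at rates R1 and R2:
\begin{align*}
  R_1       &\ge H(X \mid Y), \\
  R_2       &\ge H(Y \mid X), \\
  R_1 + R_2 &\ge H(X, Y).
\end{align*}
```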
Information-Theoretic Bounds on Transfer Generalization Gap Based on Jensen-Shannon Divergence
In transfer learning, training and testing data sets are drawn from different
data distributions. The transfer generalization gap is the difference between
the population loss on the target data distribution and the training loss. The
training data set generally includes data drawn from both source and target
distributions. This work presents novel information-theoretic upper bounds on
the average transfer generalization gap that capture: (i) the domain shift
between the target data distribution P'_Z and the source distribution P_Z
through a two-parameter family of generalized (α1, α2)-Jensen-Shannon (JS)
divergences; and (ii) the sensitivity of the transfer learner output W to
each individual sample Z_i of the data set via the mutual information
I(W; Z_i). For α1 ∈ (0, 1), the (α1, α2)-JS divergence can be bounded even
when the support of P'_Z is not included in that of P_Z. This contrasts with
the Kullback-Leibler (KL) divergence D(P'_Z || P_Z)-based bounds of Wu et
al. [1], which are vacuous in this case. Moreover, the obtained bounds hold
for unbounded loss functions with bounded cumulant generating functions,
unlike the φ-divergence based bound of Wu et al. [1]. We also obtain new
upper bounds on the average transfer excess risk in terms of the
(α1, α2)-JS divergence for empirical weighted risk minimization (EWRM),
which minimizes the weighted average training losses over the source and
target data sets. Finally, we provide a numerical example to illustrate the
merits of the introduced bounds.

Comment: Submitted for conference publication
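The support-mismatch claim can be checked numerically. The following minimal
sketch uses the one-parameter skewed JS divergence as a stand-in (the
paper's two-parameter (α1, α2)-JS family generalizes it; function names and
example distributions are assumptions): KL blows up under support mismatch
while JS stays finite.

```python
# Minimal numerical sketch: JS stays finite when supp(P'_Z) is not
# contained in supp(P_Z), while KL is infinite (KL-based bounds vacuous).
import numpy as np

def kl(p, q):
    """D(p || q); infinite if supp(p) is not contained in supp(q)."""
    mask = p > 0
    if np.any(q[mask] == 0):
        return np.inf
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def skew_js(p, q, alpha=0.5):
    """Skewed JS: alpha * D(p || m) + (1 - alpha) * D(q || m),
    with mixture m = alpha * p + (1 - alpha) * q (always finite)."""
    m = alpha * p + (1 - alpha) * q
    return alpha * kl(p, m) + (1 - alpha) * kl(q, m)

p_target = np.array([0.5, 0.5, 0.0])     # target distribution P'_Z
p_source = np.array([0.4, 0.0, 0.6])     # source distribution P_Z
print(kl(p_target, p_source))            # inf: supp(P'_Z) not in supp(P_Z)
print(skew_js(p_target, p_source, 0.3))  # finite
```

The mixture m has the combined support of both distributions, which is what
keeps every log-ratio, and hence the divergence, finite.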
Address-Event Variable-Length Compression for Time-Encoded Data
Time-encoded signals, such as social network update logs and spiking traces
in neuromorphic processors, are defined by multiple traces carrying information
in the timing of events, or spikes. When time-encoded data is processed at a
site remote from the location where it is produced, the occurrence of
events needs to be encoded and transmitted in a timely fashion. The standard
Address-Event Representation (AER) protocol for neuromorphic chips encodes the
indices of the "spiking" traces in the payload of a packet produced at the same
time the events are recorded, hence implicitly encoding the events' timing in
the timing of the packet. This paper investigates the potential bandwidth
saving that can be obtained by carrying out variable-length compression of
packets' payloads. Compression leverages both intra-trace and inter-trace
correlations over time that are typical in applications such as social networks
or neuromorphic computing. The approach is based on discrete-time Hawkes
processes and entropy coding with conditional codebooks. Results from an
experiment based on a real-world retweet dataset are also provided.

Comment: submitted
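As a rough illustration of the approach (the model form and all parameter
names are assumptions, not the paper's exact construction): a discrete-time
Hawkes-style model gives each trace a spike probability conditioned on past
activity, and an ideal entropy coder with conditional codebooks then spends
about -log2(p) bits per symbol, so predictable spike patterns cost fewer
bits than a fixed-length AER payload.

```python
import numpy as np

def spike_probs(history, base=-3.0, self_w=2.0, cross_w=0.5, decay=0.7):
    """history: (T, N) binary array of past spikes across N traces."""
    T, _ = history.shape
    kernel = decay ** np.arange(T - 1, -1, -1)   # recent spikes weigh more
    excite = kernel @ history                    # (N,) per-trace excitation
    # Intra-trace (self_w) and inter-trace (cross_w) correlations:
    logits = base + self_w * excite + cross_w * (excite.sum() - excite)
    return 1.0 / (1.0 + np.exp(-logits))         # probabilities in (0, 1)

def ideal_code_length(spikes, probs):
    """Bits an ideal (arithmetic) coder spends on one time step."""
    p = np.where(spikes == 1, probs, 1.0 - probs)
    return float(-np.log2(p).sum())

rng = np.random.default_rng(0)
traces = (rng.random((21, 4)) < 0.1).astype(float)  # 21 steps, 4 traces
probs = spike_probs(traces[:-1])                    # condition on the past
print(ideal_code_length(traces[-1], probs))         # bits for the last step
```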
Information-Theoretic Generalization Bounds for Meta-Learning and Applications
Meta-learning, or "learning to learn", refers to techniques that infer an
inductive bias from data corresponding to multiple related tasks with the goal
of improving the sample efficiency for new, previously unobserved, tasks. A key
performance measure for meta-learning is the meta-generalization gap, that is,
the difference between the average loss measured on the meta-training data and
on a new, randomly selected task. This paper presents novel
information-theoretic upper bounds on the meta-generalization gap. Two broad
classes of meta-learning algorithms are considered that use either separate
within-task training and test sets, like MAML, or joint within-task training
and test sets, like Reptile. Extending the existing work for conventional
learning, an upper bound on the meta-generalization gap is derived for the
former class that depends on the mutual information (MI) between the output of
the meta-learning algorithm and its input meta-training data. For the latter,
the derived bound includes an additional MI between the output of the per-task
learning procedure and the corresponding data set to capture within-task
uncertainty. Tighter bounds are then developed, under given technical
conditions, for the two classes via novel Individual Task MI (ITMI) bounds.
Applications of the derived bounds are finally discussed, including a broad
class of noisy iterative algorithms for meta-learning.

Comment: Accepted to Entropy
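For context, the conventional-learning bound being extended here is a known
result (Xu and Raginsky, 2017) rather than part of this paper: for a
σ-subgaussian loss and a training set S of n i.i.d. samples,

```latex
% Mutual-information generalization bound for conventional learning
% (Xu and Raginsky, 2017): W is the algorithm output, S the training set
% of n i.i.d. samples, and the loss is sigma-subgaussian.
\[
  \bigl|\mathbb{E}\,[\mathrm{gen}(W, S)]\bigr|
    \;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(W; S)} .
\]
```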
Transfer Bayesian Meta-learning via Weighted Free Energy Minimization
Meta-learning optimizes the hyperparameters of a training procedure, such as
its initialization, kernel, or learning rate, based on data sampled from a
number of auxiliary tasks. A key underlying assumption is that the auxiliary
tasks, known as meta-training tasks, share the same generating distribution as
the tasks to be encountered at deployment time, known as meta-test tasks. This
may, however, not be the case when the test environment differs from the
meta-training conditions. To address shifts in task generating distribution
between meta-training and meta-testing phases, this paper introduces weighted
free energy minimization (WFEM) for transfer meta-learning. We instantiate the
proposed approach for non-parametric Bayesian regression and classification via
Gaussian Processes (GPs). The method is validated on a toy sinusoidal
regression problem, as well as on classification using miniImagenet and CUB
data sets, through comparison with standard meta-learning of GP priors as
implemented by PACOH.

Comment: 9 pages, 5 figures, Accepted to IEEE International Workshop on
Machine Learning for Signal Processing 2021
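As a purely schematic sketch of the idea (the notation is assumed and may
differ from the paper's formulation), a weighted free-energy objective can
down-weight meta-training tasks that are less relevant to the meta-test
environment:

```latex
% Schematic weighted free-energy objective (notation assumed): q is a
% distribution over the hyperparameters, L_i the empirical loss on
% meta-training task i, w_i >= 0 a weight encoding task relevance to the
% meta-test environment, p a prior, and beta an inverse temperature.
\[
  F(q) \;=\; \sum_{i=1}^{N} w_i \,\mathbb{E}_{q}\!\left[L_i\right]
         \;+\; \frac{1}{\beta}\,\mathrm{KL}\!\left(q \,\middle\|\, p\right).
\]
```

Setting all weights w_i equal would recover an unweighted free-energy
objective over the meta-training tasks.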
An Information-Theoretic Analysis of the Impact of Task Similarity on Meta-Learning
Meta-learning aims at optimizing the hyperparameters of a model class or
training algorithm from the observation of data from a number of related tasks.
Following the setting of Baxter [1], the tasks are assumed to belong to the
same task environment, which is defined by a distribution over the space of
tasks and by per-task data distributions. The statistical properties of the
task environment thus dictate the similarity of the tasks. The goal of the
meta-learner is to ensure that the hyperparameters yield a small loss when
used for training on a new task sampled from the task environment. The
difference between the resulting average loss, known as meta-population loss,
and the corresponding empirical loss measured on the available data from
related tasks, known as meta-generalization gap, is a measure of the
generalization capability of the meta-learner. In this paper, we present novel
information-theoretic bounds on the average absolute value of the
meta-generalization gap. Unlike prior work [2], our bounds explicitly capture
the impact of task relatedness, the number of tasks, and the number of data
samples per task on the meta-generalization gap. Task similarity is gauged via
the Kullback-Leibler (KL) and Jensen-Shannon (JS) divergences. We illustrate
the proposed bounds on the example of ridge regression with meta-learned bias.

Comment: Submitted for Conference Publication
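A minimal sketch of the closing example, under assumed notation (u for the
meta-learned bias, lam for the ridge coefficient; the averaging rule for u
is one simple choice, not necessarily the paper's):

```python
# Ridge regression with a meta-learned bias u: each task solves
#   min_w ||y - X w||^2 + lam * ||w - u||^2,
# and the meta-learner sets u from previously observed related tasks.
import numpy as np

def ridge_with_bias(X, y, u, lam=1.0):
    """Closed form: w = (X^T X + lam I)^{-1} (X^T y + lam u)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * u)

def meta_learn_bias(tasks, lam=1.0):
    """One simple choice: average the per-task solutions fit with u = 0."""
    sols = [ridge_with_bias(X, y, np.zeros(X.shape[1]), lam) for X, y in tasks]
    return np.mean(sols, axis=0)

rng = np.random.default_rng(1)
w_env = np.array([1.0, -2.0])                # shared task-environment mean
tasks = []
for _ in range(5):                           # 5 related meta-training tasks
    X = rng.normal(size=(30, 2))
    w = w_env + 0.1 * rng.normal(size=2)     # similar per-task weights
    tasks.append((X, X @ w + 0.1 * rng.normal(size=30)))
u = meta_learn_bias(tasks)
X_new = rng.normal(size=(10, 2))             # small new-task data set
y_new = X_new @ w_env + 0.1 * rng.normal(size=10)
print(ridge_with_bias(X_new, y_new, u))      # pulled toward meta-learned u
```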